ABSTRACT

Two criteria for a phonemic notation, assignment of symbols to sounds
and the influence of purpose, were discussed. Also presented were three
purposes of a phonemic notation: (1) as an initial teaching medium,
(2) for an intermediate stage or stages of phonemic spelling reform of
English, and (3) for an ultimate phonemic spelling reform. Data on
relative frequency of phonemes and/or graphemes were viewed as having
importance in (1) devising phonemic codes and formulating rules,
(2) assessing compatibility with traditional orthography,
(3) estimating the possible savings in the writing and printing of
superfluous letters, and (4) standardizing a type of notation. Examples
were given which illustrate the use of data on relative frequency in
making decisions related to standardizing the type of notation. Tables
and references are included. (DH)

Relative frequency of occurrence
as a factor in the phonemic and grapheme problems of written English

Godfrey Dewey

(IRA-SSA meeting, Anaheim, Thursday, May 7, 1970)



Strictly speaking, the announced title of this paper should have included
one more word, referring specifically to problems of written English. The
English language which we speak is no more affected by whether it is recorded
graphemically in shorthand or longhand, typing or print, than by whether it is
recorded acoustically on a cylinder, a disk, or a tape. The very first
sentence of the classic "Principls of '76" (I retain the original spelling)
promulgated by the American Philological Association in 1876 was:

    The true and sole office of alfabetic writing is faithfully
    and intelligibly to represent spoken speech.

and it is with the written representation that we are here chiefly concerned.

Criteria for a phonemic notation of whatever type may be grouped in four
main categories: sounds, symbols, assignment of symbols to sounds, and the
influence of purpose. For each of these categories, statistics on relative
frequency of phonemes and/or graphemes are significant in varying degree. In
the limited time available for this paper, the first two will have to be taken
for granted, assuming substantially the phonemic basis of i.t.a. and WES (World
English Spelling), and the graphemic basis of WES, either of which would be
a major topic in itself, and confining our examination to the third, assignment
of symbols to sounds, as modified by the fourth, the influence of purpose.

Data on relative frequencies here cited are, unless otherwise specified,
taken from my studies of phonemes and graphemes, both based on exhaustive
analysis of the same 100,000 words of well-diversified connected matter, on a
41-phoneme basis (counting schwa), virtually equivalent to the phonemic basis
of i.t.a. Complete data on occurrences and items are stated usually in the
form x/y, where x equals the total of occurrences on the printed page, and
y equals the number of items (different words or syllables) involved, per
100,000 running words always understood. In general, data on occurrences are
more significant for reading, data on items more significant for writing, i.e.,
spelling.
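
The x/y convention can be illustrated with a small sketch. The word list
below is a made-up sample, not data from the study, and the function name
occurrences_and_items is an assumption introduced only for this illustration:
x counts every running word, y counts each different word once.

```python
from collections import Counter


def occurrences_and_items(running_words):
    """Return (x, y): x = total occurrences, y = number of different words."""
    counts = Counter(word.lower() for word in running_words)
    x = sum(counts.values())  # every running word is counted
    y = len(counts)           # each different word is counted once
    return x, y


# Hypothetical sample of running words containing the grapheme "th".
sample = ["the", "the", "then", "thin", "the", "that", "thin"]
x, y = occurrences_and_items(sample)
print(f"{x}/{y}")  # 7/4
```

A very frequent word such as the dominates x while adding only 1 to y, which
is why the two figures can point in different directions.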

Statistics, however carefully compiled, are chiefly valuable as an aid
to common sense, not as a substitute for it. In particular, decisions should
never be based on the most frequent spellings of sounds without taking into
account the most frequent pronunciations of spellings. These are not just
inverted statements of the same fact. Thus the predominant spellings of the
name-sounds of A, E, U are the letters a, e, u, but the predominant
pronunciations of the letters a, e, u are as in bat, bet, but respectively.
Similarly, the commonest spelling of the phoneme /z/ is the letter s, but the
commonest pronunciation of the letter s is /s/.
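
That the two directions are not inverses of each other can be checked against
the occurrence counts for s and z in the accompanying exhibit. The sketch
below is only an illustration: the dictionary layout and the function names
spellings_of, pronunciations_of, and commonest are assumptions, and only the
s and z figures are entered, so the results are meaningful for those two
letters and the phonemes /s/ and /z/ alone.

```python
# Occurrences per 100,000 running words, keyed by (phoneme, grapheme).
# Only the s and z figures from the exhibit are entered; "other" lumps
# together the rows the exhibit itself groups as "others".
COUNTS = {
    ("/s/", "s"): 12_822,
    ("/s/", "c"): 2_477,
    ("/s/", "other"): 1_782,
    ("/z/", "s"): 10_695,
    ("/z/", "z"): 247,
    ("/z/", "other"): 147,
    ("other", "s"): 136,
    ("other", "z"): 9,
}


def spellings_of(phoneme):
    """Percentage of a phoneme's occurrences written with each grapheme."""
    rows = {g: n for (p, g), n in COUNTS.items() if p == phoneme}
    total = sum(rows.values())
    return {g: 100 * n / total for g, n in rows.items()}


def pronunciations_of(grapheme):
    """Percentage of a grapheme's occurrences pronounced as each phoneme."""
    rows = {p: n for (p, g), n in COUNTS.items() if g == grapheme}
    total = sum(rows.values())
    return {p: 100 * n / total for p, n in rows.items()}


def commonest(shares):
    """Key with the largest share."""
    return max(shares, key=shares.get)


print(commonest(spellings_of("/z/")))     # s
print(commonest(pronunciations_of("s")))  # /s/
```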

Data on relative frequency of phonemes and/or graphemes can be invaluable
both in devising phonemic codes and in formulating rules and/or exceptions for
their practical application. I say codes, rather than a code, because as of
today no one phonemic code for English can conceivably be "best" for all
purposes. At the phonemic level, setting aside the precise phonetic notations
which are the legitimate and valuable tools of the linguistic scholar, but a
perplexing mystery to the untrained ear, there are at least three somewhat
different purposes to be served by a phonemic notation:

1) As an i.t.m. (initial teaching medium), the purpose of most immediate
   interest to us;
2) For an intermediate stage or stages of phonemic spelling reform of English;
3) For an ultimate phonemic spelling reform.

Much of the importance of data on relative frequency derives from the
problem of compatibility with T.O. (traditional orthography). For an i.t.m.,
the importance of compatibility in facilitating the all-important transition
to T.O. is sufficiently obvious. For an intermediate stage of spelling reform,
to be used, as Shaw put it, "side by side with the present lettering until the
better ousts the worse," the necessity for an essentially "self-reading"
degree of compatibility, for one who has never examined the code, is no less
obvious. Even for an ultimate spelling reform, which in the English-speaking
countries could hardly be imposed by decree, as Kemal Ataturk imposed the Roman
alphabet on Turkish, compatibility would surely minimize resistance to the
transition.

Yet another point at which data on relative frequency make a significant
contribution is in estimating the possible savings in the writing and printing
of superfluous letters, the aspect on which Shaw again and again laid
extravagant emphasis. For a well-designed phonemic alphabet of the supplementing
type (one sign, one sound, adding necessary new letters to the present Roman
alphabet), this saving can run just about 1 letter in 6, or $170,000,000 out
of each $1,000,000,000 of writing and printing costs. For the more immediately
practicable standardizing (no-new-letter) type, the difference from T.O. will
be only 1 or 2% either way, since the necessary new digraphs, chiefly for the
long vowels and diphthongs, just about offset the saving of silent or
otherwise superfluous letters.
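
The dollar figure follows from the letter ratio by simple proportion, on the
assumption (implicit in the estimate) that cost scales with the number of
letters. A rough check, with the function name saving introduced here for
illustration only:

```python
from fractions import Fraction


def saving(total_cost, letters_saved=Fraction(1, 6)):
    """Cost saved if the given fraction of letters is dropped,
    assuming cost is proportional to the number of letters."""
    return total_cost * letters_saved


exact = saving(1_000_000_000)   # 166,666,666 2/3
print(round(float(exact), -7))  # 170000000.0, i.e. about $170,000,000
```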

For the purpose of most immediate interest to us, initial teaching media,
WES will serve to supply examples of the application of relative frequency
data to the standardizing type of notation. The supplementing type, of which
i.t.a. is the prime exemplar, involves too many subjective judgments as to the
degree of compatibility of characters not now in the Roman alphabet to be dealt
with statistically in a paper of this length. As oral presentation of
comparative figures on phonemes and graphemes is not easy to follow, three
exhibits have been provided: "World English Spelling (WES) for better reading";
the SSA (Simpler Spelling Association) Phonemic Alphabet, which most nearly
parallels the phonemic basis of WES; and selected pre-publication figures on
relative frequency of spellings, to which I have added, for ease of oral
presentation, figures for percentages of occurrences, rounded off to the
nearest 1%.

If compatibility is to be regarded as the predominant criterion, the Roman
alphabet letters for about half of the consonant phonemes and most of the short
vowel phonemes call for no comment. Because of the awkwardness of oral
presentation, the examples discussed will be confined to a few of the most
difficult or controversial decisions, both consonant and vowel: for consonants,
the th problem, and the treatment of c and g and s; for vowels, the "u" group
of phonemes, as in but, full, fool; and an examination of the three principal
differences between WES as a spelling reform notation and as an i.t.m.

If only items are considered, the all-too-common practice in the past,
it appears that the th grapheme is pronounced unvoiced, as in thin, 65% of the
time, voiced, as in then, only 35%. This leads naturally to assigning the
familiar th grapheme to the unvoiced phoneme, with the logically cognate but
uncouth symbol dh for the voiced phoneme. If, however, occurrences, the more
appropriate criterion for reading, be considered, it appears that 90% of all
occurrences are pronounced with the voiced sound, so that assignment of the
th grapheme to other than the voiced phoneme is unthinkable. In that case,
however, there remains no satisfactory digraph for the unvoiced phoneme. The
cumbersome but intelligible thh grapheme adopted in WES may be justified to a
degree by relative frequency data on two grounds: 1) the phoneme is one of the
four least frequent in English, only 0.37%; 2) for native English-speaking
users, the distinction is virtually unnecessary. In the entire 17,000 different
words of the recent Hanna study, derived chiefly from the 4.5 million
running words which formed the basis of the Thorndike-Lorge list, there are
only 6 pairs of words (ether, either; thigh, thy; loath, loathe; mouth, mouth;
sheath, sheathe; wreath, wreathe) distinguished phonemically only by surd or
sonant pronunciation of th; and of these, only one word (either), no pair,
occurs in my list of commonest words, which includes all those found oftener
than once in 10,000 running words.
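
The reversal between the two criteria can be reproduced from the th figures
in the accompanying exhibit (12,757/114 voiced, 1,392/212 unvoiced, the voiced
figures including the word the). The sketch below is illustrative only; the
variable and function names are assumptions.

```python
# (occurrences, items) per 100,000 running words for the grapheme th.
TH = {"voiced": (12_757, 114), "unvoiced": (1_392, 212)}


def percentages(data, field):
    """Percent share of each pronunciation by occurrences (field=0)
    or by items (field=1), rounded to the nearest 1%."""
    total = sum(pair[field] for pair in data.values())
    return {name: round(100 * pair[field] / total) for name, pair in data.items()}


by_occurrences = percentages(TH, 0)
by_items = percentages(TH, 1)
print(by_occurrences)  # {'voiced': 90, 'unvoiced': 10}
print(by_items)        # {'voiced': 35, 'unvoiced': 65}
```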

Use (or non-use) of the grapheme c is bound up with the phonemes /k/ and
/s/. /k/ is spelled c in 64% of all occurrences, k in 18%, and 9 other ways
totaling 18%. Conversely, however, k is pronounced /k/ in all occurrences,
whereas c is pronounced /k/ in only 72% of occurrences, /s/ in 28%. Thus,
explicitness, as well as the more distinctive form of the letter, obviously
calls for representing /k/ by k.

A parallel example is the phoneme /j/, which is spelled g in 60% of all
occurrences, j in 26%, and 8 other ways totaling 14%. Again, however, j is
pronounced /j/ in all occurrences, whereas g is pronounced /g/ in 73% of all
occurrences, but /j/ in only about 27%, with 3 other ways totaling less than
0.5%. Quite obviously, therefore, explicitness calls for representing /j/
by j, and restricting g to /g/, except, of course, for the digraph ng which,
like any digraph, is regarded as a unitary symbol.
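
The explicitness argument for k and j can be restated as a small computation
on the occurrence counts in the accompanying exhibit. This is a sketch of the
reasoning only, not a procedure given in this paper; the function names
explicitness and most_explicit and the data layout are assumptions.

```python
# Occurrences of each pronunciation of a grapheme, per 100,000 running words.
PRONUNCIATIONS = {
    "k": {"/k/": 1_854},
    "c": {"/k/": 6_403, "/s/": 2_477, "other": 17},
    "j": {"/j/": 414},
    "g": {"/g/": 2_616, "/j/": 948, "other": 6},
}


def explicitness(grapheme, phoneme):
    """Share of a grapheme's occurrences pronounced as the given phoneme."""
    counts = PRONUNCIATIONS[grapheme]
    return counts.get(phoneme, 0) / sum(counts.values())


def most_explicit(phoneme, candidates):
    """Candidate grapheme whose pronunciation is most predictable."""
    return max(candidates, key=lambda g: explicitness(g, phoneme))


print(most_explicit("/k/", ["c", "k"]))  # k
print(most_explicit("/j/", ["g", "j"]))  # j
```

On this measure c scores about 0.72 for /k/ and g about 0.27 for /j/, against
1.0 for k and j, even though c and g are the commoner spellings.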

One more example shows the importance of considering pronunciations as well
as spellings in order to maintain the "self-reading" quality which is one
factor in compatibility. The phoneme /s/ is spelled s in 75% of all occurrences,
c in only 14%, and 7 other ways totaling 11%. The letter s, however, is
pronounced /s/ in only 54% of all occurrences, /z/ in 45%, and 2 other ways
totaling 1%. Conversely, the phoneme /z/ is spelled s in 97% of all occurrences;
the letter z is pronounced /z/ in 96% of all occurrences. This preponderance
conclusively calls for representing /z/ by z, leaving s as the explicit
representation for /s/.

Assignment of graphemes for the three vowel phonemes spelled oo in food,
good, and flood is a particularly good example of the help which relative
frequency data can render. It will be taken for granted that the best available
graphemes are oo, uu (which does not occur in T.O. but is used in the British
New Spelling), and u; for discussion of the reasons for eliminating dual use
of w as a vowel, or employing some digraph including w, or whatever, would range
too far outside the scope of this paper.

The vowel phoneme in food is spelled o in 60% of all occurrences (which
however includes the preposition to (2,924/1, 48%), most commonly pronounced
with schwa), ou in 19%, oo in 7%, and 15 other ways totaling 11%. Conversely,
oo is pronounced /u:/ in 50% of all occurrences, /u/ in 45%, /o/ in 3%, and
/ʌ/ in 2%.

The vowel phoneme in good is spelled u in 24% of all occurrences, ou
in 21%, oo in 15%, o in 15%, and 7 other ways totaling 25%. For comparison,
ou is pronounced /au/ in 38% of all occurrences, /u:/ in 30%, /u/ in 14%, /ʌ/
in 14%, /o/ in 3%, and /a/ in 1%.

The vowel phoneme of flood is spelled u in 60% of all occurrences, o in
14%, ou in 8%, oo in less than 0.5%, and 6 other ways totaling 18%. Conversely,
u is pronounced /ʌ/ in 64% of all occurrences, /u/ in 10%, /ju/ in 8%, and
5 other ways totaling 18%.

Correlating the foregoing figures:

For assignment of the oo grapheme, the fact that 50% of its occurrences
are pronounced /u:/, as against 45% pronounced /u/, is hardly conclusive.
When however it is noted that the commonest spelling of /u:/ is o, and the
commonest spelling of /u/ is u, the preponderance of the evidence clearly favors
assignment of oo to /u:/. Since the predominant spelling of /ʌ/ is u, and the
predominant pronunciation of u is /ʌ/, the traditional assignment of u to
"short u" is fully confirmed. This leaves uu as the inevitable and not
inappropriate choice for the phoneme /u/ most commonly referred to, or keyed in
diacritic notations, as "short oo."
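
The confirmation of u for "short u" rests on both directions agreeing. A
minimal sketch of that check follows, using occurrence counts from the
accompanying exhibit and the key words food, good, flood in place of phoneme
symbols. The function names and the "mutually dominant" test are assumptions
made for illustration, not a rule stated in this paper, and the "others" rows
are omitted because they cannot affect which entry is largest.

```python
# Occurrences per 100,000 running words; phonemes are named by key word.
SPELLINGS = {  # phoneme -> grapheme -> occurrences
    "food": {"o": 3_645, "ou": 1_127, "oo": 430, "u": 161},
    "good": {"u": 604, "ou": 546, "oo": 388, "o": 368},
    "flood": {"u": 3_768, "o": 857, "ou": 527, "oo": 17},
}
PRONUNCIATIONS = {  # grapheme -> phoneme -> occurrences
    "oo": {"food": 430, "good": 388, "flood": 17},
    "u": {"flood": 3_768, "good": 604},
}


def commonest(counts):
    """Key with the largest occurrence count."""
    return max(counts, key=counts.get)


def mutually_dominant(grapheme, phoneme):
    """True when the grapheme is the commonest spelling of the phoneme
    and the phoneme is the commonest pronunciation of the grapheme."""
    return (commonest(SPELLINGS[phoneme]) == grapheme
            and commonest(PRONUNCIATIONS[grapheme]) == phoneme)


print(mutually_dominant("u", "flood"))  # True
print(mutually_dominant("oo", "food"))  # False: commonest spelling of food is o
print(commonest(PRONUNCIATIONS["oo"]))  # food
```

For oo the check fails in one direction only, which is why its assignment
needs the further weighing of evidence given above.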

Concessions from one sound, one symbol writing

In principle, the chief distinction between a spelling reform notation
and an i.t.m. lies in striking the balance between maximum simplicity (i.e.,
regularity) and maximum compatibility with T.O. In practice, relative
frequency data support three major concessions from one symbol for one sound
writing (not, be it noted, from one sound for one symbol) introduced by i.t.a.
and paralleled by WES.

1) Doubled consonants for a single phoneme, where T.O. has doubled
consonants. Of the 21 consonant letters of the Roman alphabet (counting the
semivowels, w, y, h), 6 (h, k, q, w, x, y) apparently are not doubled in T.O.,
and 2 more (j, v) did not occur in the 100,000 running words which I examined.
The remaining 13, plus ck (in effect a doubled consonant), namely bb, cc, ck,
dd, ff, gg, ll, mm, nn, pp, rr, ss, tt, zz, occur 7070/1656 times, of which 99%
represent the same phoneme assigned to the corresponding single consonant. In
consequence, retention of these occurrences improves the compatibility of some
6,900 running words in 100,000, and preserves the exact T.O. forms of some
2,000; at the same time that it introduces a simple but significant step toward
the eventual transition to T.O.

2) Writing c for /k/, where T.O. has c for /k/, including cc and ck.
The figures for /k/ and c, showing /k/ spelled 64% by c, and c pronounced 72%
as /k/, have already been cited. This concession improves the compatibility
of some 6,500 words, and preserves the exact T.O. forms of some 1,200; and
again builds another simple bridge toward the impending transition to T.O.

3) Writing y for the high front unstressed vowel (between /i/ and /i:/)
which Sir James Pitman has aptly named schwi, where T.O. writes y for that
sound at the end of a word or root. The accompanying exhibits, showing /i/
spelled y in 14% of occurrences, and y pronounced /i/ (in most cases, schwi)
in 61% of all occurrences, speak for themselves. This concession improves the
compatibility of some 4,066 words, and preserves the exact T.O. forms of some
800; again, building toward the transition to T.O.

To take full advantage of data on relative frequency of phonemes and
graphemes is a far more intricate problem than these relatively simple and
straightforward examples might seem to indicate. For example, nothing has been
said on the problem of selecting the most suitable digraphs, and only one
example has been given of their assignment. Enough has been said, however, I
hope, to indicate the importance of the relative frequency aspect in practical
linguistics.

Consonants

Figures are occurrences / items per 100,000 running words, followed by the
percentage of occurrences (for th, percentage of occurrences / percentage of
items). The last line of each group is its total.

Spellings of phonemes                       Pronunciations of graphemes

/ð/   th         12,757*/  114*  90% / 35%  th  /ð/       12,757*/  114*  90% / 35%
/θ/   th          1,392 /  212   10%-/ 65%      /θ/        1,392 /  212   10% / 65%
other th              4 /    1    0  /  0
                 14,153*/  327*                           14,149*/  326*

/k/   c           6,403 / 1775   64%        k   /k/        1,854 /  343   100%
      k           1,854 /  343   18%        c   /k/        6,403 / 1775   72%
      9 others    1,753 /  562   18%            /s/        2,477 /  622   28%-
                 10,010 / 2630                  other         17 /   11   0
                                                           8,897 / 2408

/j/   g             948 /  306   60%        j   /j/          414 /  111   100%
      j             414 /  111   26%        g   /g/        2,616 /  560   73%
      8 others      220 /   75   14%            /j/          948 /  306   27%-
                  1,582 /  492                  3 others       6 /    5   0
                                                           3,570 /  871

/s/   s          12,822 / 2974   75%        s   /s/       12,822 / 2974   54%
      c           2,477 /  622   14%            /z/       10,695 / 1902   45%
      7 others    1,782 /  566   11%            2 others     136 /   30   1%
                 17,081 / 4162                            23,653 / 4906

/z/   s          10,695 / 1902   97%        z   /z/          247 /  107   96%
      z             247 /  107   2%             2 others       9 /    6   4%
      5 others      147 /   54   1%                          256 /  113
                 11,089 / 2063

*Includes the word "the", 7,310/1.

Vowels

Figures are occurrences / items per 100,000 running words, followed by the
percentage of occurrences. The last line of each group is its total.

Spellings of phonemes                       Pronunciations of graphemes

/u:/  o           3,645*/   26*  60%        oo  /u:/         430 /   88   50%
      ou          1,127 /   36   19%            /u/          388 /   54   45%
      oo            430 /   88   7%             /o/           27 /    6   3%
      u             161 /   48   3%             /ʌ/           17 /    7   2%
      15 others     688 /  124   11%                         862 /  155
                  6,051*/  322*

/u/   u             604 /  171   24%        ou  /au/       1,422 /  150   38%
      ou            546 /    8   21%            /u:/       1,127 /   36   30%
      oo            388 /   54   15%            /u/          546 /    8   14%
      o             368 /   14   15%            /ʌ/          527 /  157   14%
      7 others      671 /  219   25%            /o/          117 /   21   3%
                  2,577 /  466                  /a/           22 /   11   1%
                                                           3,761 /  383

/ʌ/   u           3,768 /  797   60%        u   /ʌ/        3,768 /  797   64%
      o             857 /  104   14%            /u/          604 /  171   10%
      ou            527 /  157   8%             /ju/         498 /  186   8%
      oo             17 /    7   0              5 others   1,039 /  279   18%
      6 others    1,124 /   52   18%-                      5,909 / 1433
                  6,273 / 1118

Concession

/i/   i          20,276 / 3807   69%        i   /i/       20,276 / 3807   89%
      y           4,100 /  885   14%            /ai/       2,107 /  302   9%
      e           2,833 /  803   10%            others       491 /  101   2%
      17 others   2,074 /  467   7%                       22,874 / 4210
                 29,283 / 5962

/y/   y           1,507 /   40   67%        y   /i/        4,100 /  885   61%
      i             145 /   36   6%             /y/        1,507 /   40   22%
      4 others      608 /  174   27%            /ai/       1,154 /   73   17%
                  2,260 /  250                  other          1 /    1   0
                                                           6,762 /  999

*Includes the preposition to, 2,924/1, 48%; most commonly pronounced with /ə/.